Extracting Best Consensus Motifs from Positive and Negative Examples
Abstract
We define the best consensus motif (BCM) problem, motivated by the problem of extracting motifs from nucleic acid and amino acid sequences. A type over an alphabet $\Sigma$ is a family $R$ of subsets of $\Sigma^*$. A motif $\pi$ of type $R$ is a string $\pi = \pi_1 \cdots \pi_n$ of motif components, each of which stands for an element of $R$. The BCM problem for $R$ is, given a yes-no sample $S = \{(\alpha^{(1)}, \beta^{(1)}), \ldots, (\alpha^{(m)}, \beta^{(m)})\}$ of pairs of strings in $\Sigma^*$ with $\alpha^{(i)} \neq \beta^{(i)}$ for $1 \le i \le m$, to find a motif $\pi$ of type $R$ that maximizes the number of good pairs in $S$, where $(\alpha^{(i)}, \beta^{(i)})$ is good for $\pi$ if $\pi$ accepts $\alpha^{(i)}$ and rejects $\beta^{(i)}$. We prove that the BCM problem is NP-complete even for the very simple type $R_1 = \{\Gamma \mid \emptyset \neq \Gamma \subseteq \Sigma\}$, which is used in practice to describe protein motifs in the PROSITE database. We also show that the problem remains NP-complete for the type $R_2 = R_1 \cup \{\Sigma^{+}\} \cup \{\Sigma^{[i,j]} \mid 1 \le i \le j\}$, where $\Sigma^{[i,j]}$ is the set of strings over $\Sigma$ of length between $i$ and $j$. Furthermore, for the BCM problem for $R_1$, we provide a polynomial-time greedy algorithm based on the probabilistic method, and its performance analysis gives an explicit approximation ratio for the algorithm.
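The abstract does not spell out the greedy algorithm, so the Python block below is only a hypothetical sketch of the general idea, not the authors' method, and it makes no claim about their approximation ratio. It assumes that a motif of type $R_1$ accepts a string exactly when the string has length $n$ and its $k$-th symbol lies in $\Gamma_k$ (exact-length matching). It then derandomizes a random motif, in which each symbol is included in each component independently with probability 1/2, by the method of conditional expectations. The names `accepts`, `count_good` and `greedy_motif`, and the parameter `n`, are illustrative assumptions.

```python
from typing import FrozenSet, List, Sequence, Tuple

# Hypothetical representation: a motif of type R1 is a list of nonempty symbol
# sets, one set Gamma_k per position.
Motif = List[FrozenSet[str]]
Sample = Sequence[Tuple[str, str]]


def accepts(motif: Motif, s: str) -> bool:
    """Assumed semantics: s is accepted iff it has the motif's length and every position matches."""
    return len(s) == len(motif) and all(c in comp for c, comp in zip(s, motif))


def count_good(motif: Motif, sample: Sample) -> int:
    """Number of pairs (alpha, beta) with alpha accepted and beta rejected."""
    return sum(accepts(motif, a) and not accepts(motif, b) for a, b in sample)


def greedy_motif(sample: Sample, alphabet: str, n: int) -> Motif:
    """Greedy via conditional expectations (illustrative; strings must be over `alphabet`)."""
    # prob[k][c] = Pr[c in Gamma_k]: 0.5 while undecided, then fixed to 1.0 or 0.0.
    prob = [{c: 0.5 for c in alphabet} for _ in range(n)]

    def expected_good() -> float:
        # Expected number of good pairs under the current (partly fixed) random motif.
        total = 0.0
        for a, b in sample:
            if len(a) != n:
                continue  # alpha can never be accepted
            p_a = 1.0
            for k, x in enumerate(a):
                p_a *= prob[k][x]
            if len(b) != n:
                total += p_a  # beta is always rejected
                continue
            p_ab = 1.0  # Pr[alpha and beta are both accepted]
            for k, (x, y) in enumerate(zip(a, b)):
                p_ab *= prob[k][x] if x == y else prob[k][x] * prob[k][y]
            total += p_a - p_ab  # Pr[alpha accepted and beta rejected]
        return total

    # Fix one (position, symbol) decision at a time and keep the better branch.
    # The current expectation is the average of the two branches, so it never drops.
    for k in range(n):
        for c in alphabet:
            prob[k][c] = 1.0
            keep = expected_good()
            prob[k][c] = 0.0
            drop = expected_good()
            prob[k][c] = 1.0 if keep >= drop else 0.0

    motif = [frozenset(c for c in alphabet if prob[k][c] == 1.0) for k in range(n)]
    # R1 forbids empty components. An empty one means no string is accepted
    # (0 good pairs), so filling it with an arbitrary symbol cannot lower the count.
    return [comp or frozenset(min(alphabet)) for comp in motif]
```

For example, `greedy_motif([("AC", "AG")], "ACGT", 2)` keeps every symbol in the first component and drops only `G` from the second, so the resulting motif accepts `AC` and rejects `AG`. Ties are resolved toward inclusion, which makes the motifs permissive. Each of the $n|\Sigma|$ decisions costs $O(mn)$ to evaluate, so the sketch runs in polynomial time.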
Similar resources
Algorithms for Extracting Structured Motifs Using a Suffix Tree with an Application to Promoter and Regulatory Site Consensus Identification
This paper introduces two exact algorithms for extracting conserved structured motifs from a set of DNA sequences. Structured motifs may be described as an ordered collection of p ≥ 1 "boxes" (each box corresponding to one part of the structured motif), p substitution rates (one for each box) and p - 1 intervals of distance (one for each pair of successive boxes in the collection). The con...
Discriminative Detection of Transcription Factor Binding Sites from Location Data
The interaction between transcription factors (TFs) and their DNA binding sites (motifs) plays a key role in understanding gene regulation mechanisms. While a number of methods have been proposed previously, most of them search for statistically over-represented patterns in the upstream sequences of clustered, and presumably co-regulated, groups of genes. On the other hand, genomewide location...
Reliable Negative Extracting Based on kNN for Learning from Positive and Unlabeled Examples
Many real-world classification applications fall into the class of positive and unlabeled learning problems. Almost all existing techniques are based on the two-step strategy. This paper proposes a new reliable negative extracting algorithm for step 1. We adopt the kNN algorithm to rank the similarity of unlabeled examples from the k nearest positive examples, and set a threshold to label some ...
An Analytical Study on Calligraphic, Human and Vegetal Motifs in Some Examples of Enameled Glasses in Egypt and Syria (Mamluke Period) in Comparison with Iranian Metalwork (Ilkhanid and Timurid Periods)
Throughout history, artworks in the field of metalwork and glasswork reflect different themes. They are considered as important means of manifesting Islamic art and traditional crafts in different countries which have been producing a wide variety of art products. Meanwhile, the influence of some kinds of artworks from different lands and the counterinfluence of concepts and artistic themes amo...
Consensus Control for Multi-agent Systems with a Faulty Node
This paper studies consensus control for a multi-agent system with a faulty node. The node dynamics follow a continuous-time consensus protocol with negative feedback from the relative state of the neighbors, where the faulty node is instead using positive feedback from the state. Conditions for reaching consensus are established, and a fault threshold is introduced. Numerical examples investig...
Publication date: 1996